Step 1
Step 2
Step 3
Step 4:



How rings assign address space

align incoming address to self (to some power of 2)
assign the result to self address

next_addr = self_addr + self_addr_space; // number of registers used locally
send down next_addr



Example:

Dim needs 16 addresses
Uart needs 4
Inter needs 256

Enumeration message arrives with Addr = 8

Dim: self = 16, sends down Addr = 32
Uart: self = 32, sends down Addr = 36
Inter: self = 256, sends down Addr = 512
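
The allocation rule can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation from the source: the names `align_up` and `enumerate_ring` are hypothetical, and every member's address space is assumed to be a power of 2.

```python
def align_up(addr, space):
    """Round addr up to the next multiple of space (space must be a power of 2)."""
    assert space > 0 and space & (space - 1) == 0, "space must be a power of 2"
    return (addr + space - 1) & ~(space - 1)


def enumerate_ring(incoming_addr, spaces):
    """Walk the ring once; return each member's self address and the final next_addr."""
    self_addrs = []
    addr = incoming_addr
    for space in spaces:
        self_addr = align_up(addr, space)  # align incoming address to own size
        self_addrs.append(self_addr)
        addr = self_addr + space           # next_addr, sent down the ring
    return self_addrs, addr


# Members needing 16, 4 and 256 addresses, with an incoming address of 8.
print(enumerate_ring(8, [16, 4, 256]))  # ([16, 32, 256], 512)
```

The alignment step leaves gaps (here 8 to 15 and 36 to 255 stay unused), which is the price for letting each member decode its own address with a simple mask.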
[Figure: two flip-flops in series; the first is clocked by clk1, the second by clk2]



If the delay between "clk1" and "clk2" is
greater than the delay from Q of the first
flip-flop to D of the second flip-flop, we
have a race on our hands, meaning the
right-hand flip-flop will sample the data
of Q a whole clock period early.
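
The race condition stated above reduces to one comparison. The following is a minimal sketch; the function name `has_race` and the nanosecond units are assumptions for illustration, and the hold time of the second flip-flop is ignored.

```python
def has_race(clk_skew_ns, q_to_d_delay_ns):
    """True when clk2 lags clk1 by more than the Q-to-D data delay.

    In that case the second flip-flop sees the new data before its own
    clock edge and samples it one clock period early.
    """
    return clk_skew_ns > q_to_d_delay_ns


print(has_race(1.2, 0.8))  # True: clock skew exceeds data delay
print(has_race(0.3, 0.8))  # False: data arrives after the clk2 edge
```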






compound A compound B 



Clock runs with data

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay is at least long enough
and, more importantly, that it is easily
checked after layout.
Clock opposes data

This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this
simple case).



compound 




data_a, which changes after clkb,
which is later than clka, is sampled
by clka. NO RACE.



[Figure: compound A and compound B; the clock (clka, then clkb) travels against the data direction]
[Figure: bridge and member "b" with a latch on the data path; waveforms of data, clock, clka, clkb and data_a, with the uncertainty range marked]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb.
clkb lags far behind clka of the bridge. As the
waveforms clearly show, a race is imminent. Here we
should add latches on all the data lines (~90).
Adding a latch works only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty kills the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
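
The 75% rule and the resulting member limit can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, times are integer picoseconds, and the assumption that clock lag grows by a fixed amount per ring member is mine, not stated in the source.

```python
def latch_fix_works(clk_lag_ps, cycle_ps, limit_percent=75):
    """True when the clka-to-clkb lag is below the given share of the cycle."""
    return clk_lag_ps * 100 < limit_percent * cycle_ps


def max_ring_members(cycle_ps, lag_per_member_ps, limit_percent=75):
    """Largest member count whose accumulated lag still passes the rule.

    Assumes (hypothetically) that each member adds the same clock lag.
    """
    return (limit_percent * cycle_ps - 1) // (100 * lag_per_member_ps)


print(latch_fix_works(7000, 10000))   # True: 70% of the cycle
print(latch_fix_works(8000, 10000))   # False: 80% of the cycle
print(max_ring_members(10000, 500))   # 14
```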



APP ID= 10064343 



Page 233 of 280 



[Figure: waveforms of data and clocks, with the uncertainty range marked]



Here, data_b leaves member "b" to be sampled by
clka in the bridge. Again clkb lags far behind
clka. This actually works to our advantage, as long
as the lag is smaller than the better part of a clock
cycle. This solution looks better: between adjacent
members we can take care to delay the data
beyond the danger zone of clock delay, the OK signals
are covered automatically, and the last-leg data is also
covered. The only signal that is not safe is the OK from
the bridge to member "b". It will need a latch in "b".



[Figure: big module with its ring interface; labels: local clock, local data_out, data_in, data from previous member, clk, data, clock]



The local clock lags behind the
ring_interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the ring-i/f flip-flops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "[illegible]". In this case
communication between "b" and "c" suffers degradation
whenever the traffic peaks coincide.
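
Both latency effects can be illustrated with a small sketch. It assumes a unidirectional ring whose members are ordered a, b, c, d; that ordering and the function names `ring_hops` and `links_used` are assumptions for illustration.

```python
def ring_hops(src, dst, members):
    """Number of hops from src to dst on a one-way ring."""
    return (members.index(dst) - members.index(src)) % len(members)


def links_used(src, dst, members):
    """List of (from, to) links a message crosses on its way from src to dst."""
    n = len(members)
    start = members.index(src)
    return [(members[(start + k) % n], members[(start + k + 1) % n])
            for k in range(ring_hops(src, dst, members))]


members = ["a", "b", "c", "d"]
print(ring_hops("a", "b", members))  # 1
print(ring_hops("c", "b", members))  # 3

# Traffic from "a" to "d" crosses the link that "b" needs to reach "c".
shared = set(links_used("a", "d", members)) & set(links_used("b", "c", members))
print(shared)  # {('b', 'c')}
```

Under these assumptions a message from "c" to "b" travels almost the whole ring, and any heavy flow that crosses the b-to-c link competes with traffic between "b" and "c".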
The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it reaches one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets to D1 fast. By the same magic,
a message from V1 to D2 is bypassed as well, while
a message from V1 to D1 is handled directly.



[Figure: ring with a land bridge; the near end is labeled 3]




Enumeration is started by the "Anchor",
which assigns address = 1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is a "near" end, which got
enumeration label "3", and a "far" end,
marked 6.



When msg1 and msg2 arrive at the same time,
the bridge end must decide
which message to forward first.

It can be shown that an unwise decision can
lead to freeze-out, deadlock and the option price
dropping to $5.

Therefore msg2 gets the priority.
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.

So we have exactly one enforcer for each
ring.



3 near 




In a land bridge ring, the situation is trickier. Suppose V2
sends a message to address == 5. The land bridge
diverts it at the 11/far end; it re-appears at 3 and
starts cycling forever.

We have to define an algorithm that takes
care of all cases.

Luckily there is a way.

The land bridge deals only with messages that arrive
at the far end and are diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
that go straight through are left alone.
[Figure: ring chip with scan added; labels: ring out, ring in, scan insert module, insert scan in pads, clock, scan mux]

msg_type
msg_addr

msg_data(63:8): not used for scan
msg_data(7:0): used for scan chains
[Figure: timing diagram of the ring signals at member A: type, addr, data (64 bits), ok, scan_test, reset, clk]



During the first clock, OK remains active while type
carries msgA. This means that on the next clock
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the fifo in memberB is full. One
clock later the fifo has a free entry, so OK returns to
1 and type returns to idle on the next clock. Instead of
returning to idle, type could also change to the next
message, if there was one.
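
The OK handshake described above can be modeled clock by clock. This is a simplified sketch, not the ring-interface logic itself: the function name `type_trace` is hypothetical, and OK is treated as a per-clock boolean that the sender sees in the same clock in which it drives the message.

```python
def type_trace(messages, ok_per_clock):
    """Return what the sender drives on the 'type' lines in each clock.

    A message stays on the lines until a clock in which OK is active;
    with nothing left to send, the lines carry 'idle'.
    """
    pending = list(messages)
    trace = []
    for ok in ok_per_clock:
        trace.append(pending[0] if pending else "idle")
        if ok and pending:
            pending.pop(0)  # accepted, so the next clock may carry a new message
    return trace


# OK is active, drops for one clock while the fifo is full, then returns.
print(type_trace(["msgA", "msgB"], [1, 0, 1, 1]))
# ['msgA', 'msgB', 'msgB', 'idle']
```

msgB occupies the lines for two clocks because OK is inactive during the first of them, matching the one-clock stall in the waveform description.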



[Figure: ring member ports: imsg/iok, omsg/ook, dmsg/dok, umsg/uok]
[Figure: member A and member B chained on the ring; each has imsg, iok, omsg and ook ports; also labeled: omsg, uok]



Incoming messages are examined first
to see whether they are supervisor or work/program messages.
Work/program messages have an address field.
We check whether it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named the split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during
the enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the implementation of the
unused bits.
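
The compare-and-split step can be sketched as below. This is an illustration under assumptions: the function name `decode_address` is hypothetical, the address width is taken as 24 bits, and the member's address space is a power of 2 as stated above.

```python
ADDR_BITS = 24  # assumed address width


def decode_address(incoming, self_addr, space):
    """Return (ours, internal_addr) for an incoming work/program address.

    space is the member's address space size, a power of 2. Only the bits
    selected by the split mask are compared with the self-address register;
    the low bits go inside the member as the internal address.
    """
    low_mask = space - 1
    split_mask = ((1 << ADDR_BITS) - 1) & ~low_mask
    ours = (incoming & split_mask) == (self_addr & split_mask)
    internal_addr = incoming & low_mask
    return ours, internal_addr


print(decode_address(0x105, 0x100, 256))  # (True, 5)
print(decode_address(36, 32, 4))          # (False, 0)
```

Because the low bits of the self address are masked out before the compare, their value in the register never matters, which is why they can be dropped from the hardware.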



[Figure: address comparator; inputs: incoming address, split mask, self-address register (with its don't-care part); outputs: ours/through, and the part of the address that enters the member]







The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.







The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



[Figure: Vobla network processor and its expansion areas]

Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing

Internal Memory: Fast, Unified, Multi-port

Vobla Network Processor

Peripheral Expansion: Enet, ATM, Uart, USB, Serials

System Expansion Area: CPU (PP), DMA, Smart FIFO, Ext. mem I/F
[Figure: Vobla compound block diagram. Vobla Core units: Fetch (FTU), LSU, Program Sequencer (PSU), PBU, RFU, Arithmetic (DALU), Agent I/F (AGI). Surrounding blocks: Internal Memory, Core Debug, Doorbell, DMA Agent, Timers, Others]






[Figure: register files around a task switch.
Before: Current Task (CTID) = Task_X, Next Task = Task_Y; Memory; Destination; Active (Active Reg. File); Source Operand of Task X.
After a task switch: Current Task (CTID) = Task_Y, Next Task = Task_Z; Memory; Write Queue; Destination; Active (Active Reg. File); Shadow 1 (Active Shadow); Shadow 2 (Preload Shadow); Source Operand of Task Y; Preload regs of Task Y]
[Figure: stream data in, from the "External world" through the Peripheral Fifo into "The system" (Internal memory, External memory, task chain), together with other data]

[Figure: stream data out, from Internal memory through the Peripheral Fifo]



General purpose registers: 32 registers of 32 bits per task

Doorbell: a set of indications per task, which control task execution scheduling

Agent interface: an interface to adjacent resources

Internal memory: fast memory accessed by load/store instructions

Special purpose registers: per task and global registers

MP configuration registers: initialized by the PP

External memory: big area accessed via a DMA interface
R1 register:

s - sticky bit
z - equal / zero
lt - less than / negative
c - carry
mb - reflection of the RAM multi-reader busy indication



[Figure: bit layouts (bits 31 to 0) of the TASK SPR registers, including the registers at tspr index = 2 and tspr index = 3]
Frame structure of an example task type

[Figure: frame layout in internal memory]
common task data
task fragment 1 data
task fragment 2 data
task fragment 3 data
data of all level2 functions
level1 f1 data
level1 f2 data
level1 fn data

size of level0 frame part is different for each task type
size of level2 frame part is constant
size of level1 frame part is different per each task type
[Figure: core pipeline. Stages: FETCH, DECODE, ADDRESS, EXECUTE, WRITE. Blocks: PC, DECODE LOGIC, ADDRESS ALU, RF IN MUX, DATA ALU, REGISTER FILE, CC, SPR. Buses: FAB, FDB, DAB, LDB, SDB, to memory. Legend: flip-flop, logic]
[Figure: CRC unit block diagram. Blocks: input request message fifo, decoder, memory interface, data packer and aligner, CRC calculation, data out counter, message encoder. Signals: snoop, last in frame, last in transfer, crc_stall, mem stall, source_add[15:0], type_in, ring interface_address[15:0], ring interface_data[31:0], type out[7:0], address out[23:0], data out[63:0] (also CRC), ok, reset, memory data (data[add0] to data[add7])]
[Figure: Vobla agent interface; labels: RA, entry, message encoder, agent command]